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Abstract 

Parameter identification problems are formulated in a probabilistic language, 
where the randomness reflects the uncertainty about the knowledge of the true val- 
ues. This setting allows conceptually easily to incorporate new information, e.g. 
through a measurement, by connecting it to Bayes's theorem. The unknown quant- 
ity is modelled as a (may be high-dimensional) random variable. Such a description 
has two constituents, the measurable function and the measure. One group of meth- 
ods is identified as updating the measure, the other group changes the measurable 
function. We connect both groups with the relatively recent methods of functional 
approximation of stochastic problems, and introduce especially in combination with 
the second group of methods a new procedure which does not need any sampling, 
hence works completely deterministically. It also seems to be the fastest and more 
reliable when compared with other methods. We show by example that it also works 
for highly nonlinear non-smooth problems with non-Gaussian measures. 
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1 Introduction 

In trying to predict the behaviour of physical systems, one is often confronted with 
the fact that although one has a mathematical model of the system which carries some 
confidence as to its fidelity, some quantities which characterise the system may only be 
incompletely known, or in other words they are uncertain. See [27] for a synopsis on our 
approach to such parametric problems. 

Here we want to identify these parameters through observations or measurement of 
the response of the system. Such an identification can be approached in different ways. 
One way is to measure the difference between observed and predicted system output and 
try to find parameters such that this difference is minimised, this optimisation approach 
leads to regularisation procedures [7]. 

Here we take the view that our lack of knowledge or uncertainty of the actual value 
of the parameters can be described in a Bayesian way through a probabilistic model 
[14, 38]. The unknown parameter is then modelled as a random variable — also called 



the prior model — and additional information on the system through measurement or 
observation changes the probabilistic description to the so-called posterior model. The 
second approach is thus a method to update the probabilistic description in such a way 
as to take account of the additional information. 

To be more specific, let us consider the following situation: we are investigating some 
physical system which is modelled by an evolution equation for its state: 



where u(t) G U describes the state of the system at time t G [0, T] lying in a Hilbert space 
U (for the sake of simplicity) , A is an operator modelling the physics of the system, and 
/ G Li* is some external influence (action / excitation / loading). The model depends 
on some parameters p G V, and by q we denote that component of the parameters p 
which we are uncertain about and would thus like to identify the actual value. 

Now assume that we observe a function of the state Y(u(q), q), and from this obser- 
vation we would like to identify the corresponding q. This is called the inverse problem, 
and as the mapping q \— > Y{q) is usually not invertible, the inverse problem is ill-posed. 
Embedding this problem of finding the best q in a larger class by modelling our know- 
ledge about it with the help of probability theory, then in a Bayesian manner the task 
becomes to estimate conditional expectations, e.g. see [14, 38] and the references therein. 
The problem now is well-posed, but at the price of 'only' obtaining probability distribu- 
tions on the possible values of q, which now is modelled as a Q- valued random variable 
(RV). Predicting what the measurement Y(q) should be from some assumed q is com- 
puting the forward problem. The inverse problem is then approached by comparing the 
forecast from the forward problem with the actual information. 

Since the parameters of the model to be estimated are uncertain, all relevant inform- 
ation may be obtained via their stochastic description. In order to extract information 
from the posterior most estimates take the form of expectations w.r.t. the posterior. 
These expectations — mathematically integrals, numerically to be evaluated by some 
quadrature rule — may be computed via asymptotic, deterministic or sampling methods. 
In our review of current work we follow our recent reports [35, 31]. 

One often used technique is a Markov chain Monte Carlo (MCMC) method [21, 10], 
constructed such that the asymptotic distribution of the Markov chain is the Bayesian 
posterior distribution. This can be then sampled by letting the Markov chain run for a 
sufficiently long time, although the samples are not independent in this case. With the 
intention of accelerating the MCMC method some authors [22, 16, 32, 17] have intro- 
duced stochastic spectral methods into the computation. Expanding the prior random 
process into a polynomial chaos (PCE) or a Karhunen-Loeve expansion (KLE) (e.g. 
[24]), the inverse problem becomes an inference on the weights of the Karhunen-Loeve 
modes. Pence et al. [32] combine polynomial chaos theory with maximum likelihood 
estimation, where the parameter estimates are calculated in a recursive or iterative 
manner. Christen and Fox [6] have applied a local linearisation of the forward model to 
improve the acceptance probability of proposed moves, while in [2, 20, 18, 39] collocation 
methods are employed as a more efficient sampling technique. 

The approaches mentioned above require a large number of samples in order to 
obtain satisfactory results. While showing some results in this direction, the main idea 
here is to do the Bayesian update directly on the polynomial chaos expansion (PCE) 
without any sampling [31, 35, 27]. This idea has appeared independently in [3] in a 
simpler context, whereas in [36] it appears as a variant of the Kalman filter (e.g. [15]). 
A PCE for a push-forward of the posterior measure is constructed in [28]. 

From this short overview it becomes apparent that the update may be seen abstractly 
in two different ways. Regarding the uncertain parameters 
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where the set of elementary events is f2, 21 a cr-algebra of events, and P a probability 
measure, one set of methods performs the update by changing the probability measure 
P and leaving the mapping q(oj) as it is, whereas the other set of methods leaves the 
probability measure unchanged and updates the function q(uj). In any case, the push 
forward measure q*F on Q is changed from prior to posterior. 

The organisation of the paper is as follows: in Section 2 we review the Bayesian 
update and recall the link between the conditional measure and conditional expecta- 
tion. This allows to recover a Kalman filter like update for the RV via the conditional 
expectation. In the following Section 3 different ways of numerically computing the 
Bayesian update are indicated. The resolution of the forward problem in the context 
of stochastic discretisation procedures is outlined in Section 4. The numerical examples 
for such parameter identification procedures, contained in Section 5, are representative 
of some linear and nonlinear problems in structural and continuum mechanics. 

2 Bayesian Updating 

In the setting of Eq. (1) let us pose the following problem: Some components — let us 
denote these by q — of the parameters p £ V are uncertain. To be more specific, assume 
that q £ Q are elements of some vector space. By making observations z k at times 
< t\ < ■ ■ ■ < tfc • • • G [0, T] one would like to infer what they are. But we can not 
observe the entity q directly — like in Plato's cave allegory we can only see a 'shadow' of 
it, formally given by a 'measurement operator' 



at least this is our model of what we are measuring. Frequently the space y may be 
regarded as finite dimensional, as one can anly observe a finite number of quantities. 
Usually the observation will deviate from what we expect to observe even if we knew the 
right q as Eq. (1) is only a model — so there is some model error e, and the measurement 
will be polluted by some measurement error e. Hence we observe z k = y k + e + e. From 
this one would like to know what q and u(tk) are. For the sake of simplicity we will only 
consider one error term Zk = y k + £ which subsumes all the errors. 

The mapping in Eq. (3) is usually not invertible and hence the problem is called ill- 
posed. One way to address this is via regularisation (see e.g. [7]), but here we follow a 
different track. Modelling our lack-of-knowledge about q and u(tk) in a Bayesian way [38] 
by replacing them with a Q- resp. ^-valued random variable (RV), the problem becomes 
well-posed [37]. But of course one is looking now at the problem of finding a probability 
distribution that best fits the data; and one also obtains a probability distribution, not 
just one pair q and u{tk). Here we focus on the use of a linear Bayesian approach [11] 
in the framework of 'white noise' analysis. 

The mathematical setup then is as follows: we assume that Q is a measure space 
with c-algebra 21 and with a probability measure P, and that q : Q — >• Q and u : I? —> U 
are random variables. For simplicity, we shall also require Q to be a Hilbert space where 
each vector is a possible realisation. This is in order to allow to measure the distance 
between different q's as the norm of their difference, and to allow the operations of linear 
algebra to be performed. 

2.1 Bayesian updating of the measure 

Bayes's theorem is commonly accepted as a consistent way to incorporate new knowledge 
into a probabilistic description [14, 38]. The elementary textbook statement of the 
theorem is about conditional probabilities 
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where I q is some subset of possible g's, and M y is the information provided by the 
measurement. This becomes problematic when the set M y has vanishing probability 
measure, but if all measures involved have probability density functions (pdf), it may 
be formulated as ([38] Ch. 1.5) 

n q (q\v) = —P q (l), (5) 
w 

where p q is the pdf of q, p(y\q) is the likelihood of y = Y(q) given q, as a function of 
q sometimes denoted by L{q), and w is a normalising factor such that the conditional 
density 7T g (-|y) integrates to unity. These terms are in direct correspondence with those 
in Eq. (4). Most computational approaches determine the pdfs [23, 37, 16]. 

However, Kolmogorov already defined conditional probabilities via conditional ex- 
pectation, e.g. see [4]. Given the conditional expectation E (-\M y ), the conditional prob- 
ability is easily recovered as F(I q \M y ) = E(x/ |My), where xi q is the characteristic 
function of the subset I q . 



2.2 Conditional expectation 

The easiest point of departure for conditional expectation in our setting is to define it not 
just for one piece of measurement M y , but for sub-cr-algebras 6 C 21. The connection 
with an event M y is to take 6 = cr(Y), the c-algebra generated by Y. 

For RVs with finite variance — elements of S := L2(.!?,2l, P) — the conditional expect- 
ation E(-|G) is defined as the orthogonal projection onto L2(i?,G,P). It can then be 
extended as a contraction onto all L p (f2, 21, P) with p > 1, e.g. see [4]. 

Let us define the space £2 := Q (g) S of Q-valued RVs of finite variance, and set 
■2n '■= Q® S n with S n := L2{fl, &, P) for the Q-valued RVs with finite variance on the 
sub-cr-algebra (3, representing the new information. 

The Bayesian update as conditional expectation is now simply formulated: 

E(q\a(Y)) = P<s n (q) = arg mm^Jq - q\\%, (6) 

where Pg n is the orthogonal projector onto J2 n . Already in [15] it was noted that the 
conditional expectation is the best estimate not only for the loss function 'distance 
squared', as in Eq. (6), but for a much larger class of loss functions under certain 
distributional constraints. However for the above loss function this is valid without any 
restrictions. 

Requiring the derivative of the loss function in Eq. (6) to vanish — equivalently re- 
membering from elementary geometry that the line to the closest point is perpendicular 
to the approximating subspace — one arrives at the Galerkin orthogonality conditions 

Vg€^ n : (q-E(q\a(Y)),q) s = 0. (7) 

To continue, note that the Doob-Dynkin lemma [4] assures us that if a RV like 
E (q\a(Y)) is measurable w.r.t. o~(Y), then K(q\a(Y)) = ip(Y) for some measurable 
ip ^ Lo(y; Q). More precisely one should write E (q\a(Y)) = tp(Y(q)) = ip o Y o q. 

Hence £} n = span{</>o Y og E £} | (ft £ Lo(y-, Q)}, where Lo(y; Q) is the vector space 
of measurable maps from y to Q. In particular one sees that E (q\a(Y)) is of this form. 
In this light the task of finding the conditional expectation may be seen as rephrasing 
Eq. (6) as: find ip S ^0(3^ Q) such that 

tft = arg min 0gio(y . Q) ||g -cftoYo q\\%. (8) 

Then q a := ip(y) = E((7|y) is called the analysis, assimilated, or posterior value, incor- 
porating the new information. 

We would like to emphasise that it is the vector space setting of Q and y which has 
made this well-known formulation possible [15], and it will also allow for easy numer- 
ical computation. To work with measures as in Eq. (4) is cumbersome, as probability 
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measures are on the intersection of the unit sphere and the positive cone in the space of 
signed finite measures. Similarly, in Eq. (5) the pdfs are in the positive cone of L\(Q) 
and on the unit sphere in L\(f2), as they have to integrate to unity. A bit easier would 
be to work with RVs which are in a metric space, the so-called Frechet-type conditional 
expectation then minimises the metric distance squared; but the Hilbert space setting 
is certainly the simplest instance of this, and we will adhere to it here. 

In case the g's are not without constraints, or not in a vector space, then they should 
be mapped to such quantities. For example, if q is a diffusion tensor field, then it has 
to be symmetric and positive definite. The symmetric tensors are of course a subspace, 
but the sub-manifold of positive definite ones is not a subspace, nor is this subset closed, 
making a minimisation as in Eq. (6) ill-defined. However the symmetric positive definite 
tensors can be given the structure of a Lie group and a Riemannian manifold [1], and 
then distance is measured as the length of a path along a geodesic. Furthermore, the 
associated Lie algebra — the tangent space at the neutral element of the Lie group — is in 
one to one correspondence with the geodesies; hence one can play everything back to a 
vector space. A simple case of this are positive scalars; through the logarithm they are 
transformed into a vector space without constraints. 

2.3 The linear Bayesian update 

As we work in a vector space, we make an approximation to simplify the computations 
by replacing Lo(3 ; ; Q) above by Jif(y, Q) C Lo(y; Q), the subspace of linear continuous 
maps. The minimisation Eq. (8) is then translated to: find K G (3^, Q) such that 
[15, 19] 

K = arg mm He ^ { y Q) \\q - Ho Y o q\\%, (9) 

and we set E (q\o~(Y)), := K oY o q, a linear in Y approximation to E (q\a(Y)). As the 
projection is now onto the smaller space ^ := spanji? oYoq£j2\HE j£f (y, Q)} C 
we are not using all the information available. Hence the error — the value 
of the functional being minimised — will remain larger, but the computation is simpler. 
The optimal K is not hard to find by taking the derivative in Eq. (9) w.r.t. the linear 
map H (see e.g. [15] and [19] Ch. 3.2 Thm. 4.711), equivalently one may rewrite the 
variational Galerkin orthogonality condition Eq. (7), for the final result see Eq. (10). 

In the case of prior information represented by the forecast RV qt , which results in 
the measurement forecast yj = Y(qt), the projection is adjusted by an affine shift to 
[15,19] 

q a = q f + K(z - y f ), with K = C q , y (C y + C e )~\ (10) 

here the operator K is also known as the Kalman gain. This includes the errors e 
assumed independent of q with zero mean and covariance operator C £ , where C y : = 

E (Y(q, u) Y(q, u)\ and C q ^ y = E (q Y(q, u)j , where for any RV like q for the sake 

of brevity we set q := E (q) such that q := q — q is the zero- mean part. In case C y + C £ 
is not invertible or close to singularity, its inverse in Eq. (10) should be replaced by the 
Moore-Penrose pseudo-inverse. This update is in some ways very similar to the 'Bayes 
linear' approach [11]. If the mean is taken in Eq. (10), one obtains the familiar Kalman 
filter formula [15] for the update of the mean, and one may show [31] that Eq. (10) also 
contains the Kalman update for the covariance. 

Stated differently, in the situation of Eq. (10) q a is the orthogonal projection of q 
onto the subspace £l a = &f + which is generated jointly by the prior information 
and the measurement. This may be written as J2 a = £if + = © where the 
information gain or innovation =2, is the part of orthogonal to the last orthogonal 
sum reflecting the terms in Eq. (10). 

Each new measurement enlarges the space we project onto, and thus constrains 
the error q — q a — which is in the orthogonal complement — further. In case the space 
generated by the measurements is not dense in £2 a residual error will thus remain, as 
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the measurements do not contain enough information to resolve our lack of knowledge 
about q. Anyway, finding q is limited by the presence of the error e, as obviously the 
error influences the update in Eq. (10). If the measurement operator is approximated 
in some way — as it will be in the computational examples to follow — this will introduce 
a new error, further limiting the resolution. 

3 Numerical realisation 

In the instances where we want to employ the theory detailed in the previous Section 2, 
the spaces U and Q are usually infinite dimensional, as is the space S = L2(f2). For an 
actual computation they have to be discretised or approximated by finite dimensional 
spaces. In our examples we will chose finite element discretisations and corresponding 
subspaces. Hence let Qm '■= span {g m : m = 1,...,M} C Q be an M-dimensional 
subspace with basis {Qm}m=l- ^n element of Qm will be represented by the vector q = 
[g 1 , . . . , q M ] T G M M such that Ylm=i Q m 6rn £ Qm- The space of possible measurements 
can usually be taken to be finite dimensional, whose elements similarly are represented 
by a vector of coefficients z 6 M. R . 

On i M , representing Qm, the minimisation in Eq. (9) is translated to 

K = arg mm Hm MxR\\q - H(Y(q))\\ 2 Q , (11) 

where the mapping induced by Y has been denoted by boldface Y, and the norm ||q||q 
results from the inner product {q\\q2)Q '■= ^ {qi T QQ2) with Q mn = (qm\qn)g, the 
Gram matrix of the basis. We will later choose an orthonormal basis, so that Q = I is 
the identity matrix. Then the update corresponding to Eq. (10) is 

q a = q f + K(z - y f ), with K = C q , y {C y + C £ )-\ (12) 

Here the covariances are naturally C ?i2/ := E (q y T ^ = E (q ® y), and similarly for C y 
and C £ . 

3.1 Markov chain Monte Carlo 

We shall shortly sketch the Markov chain Monte Carlo (MCMC) method for the sake of 
completeness [21, 10], as it will be used on some examples. It is a method which changes 
the underlying probability measure according to Eq. (4) and Eq. (5) in Subsection 2.1. 

The idea is to construct a Markov chain with an equilibrium distribution which 
corresponds to some desired distribution. In our case this will be the posterior or 
conditional distribution. The method has the distinct advantage that it does not need 
target probabilities but only ratios of target probabilities to work. This means that the 
normalisation constant w in Eq. (5) does not have to computed at all — this would involve 
a cumbersome high-dimensional integration — as it cancels out. Having such a Markov 
Chain, all that is required to obtain samples from the posterior is to run the Markov 
chain for a sufficiently long time. As the posterior is the equilibrium distribution, one 
is indeed sampling the posterior. 

The Metropolis algorithm to achieve the construction is described very quickly for 
the simplest case [21]: assume that the datum y has been observed, and that the range 
of possible g's has been quantised into X equal bins with representatives {q^}f = i, to 
each (/£ assign the number := L(q^)p q (q^) — the product of likelihood and prior pdf. 
The representatives of the bins q^ will be the states of the Markov chain, which will be 
denoted by {$k}k=l,... i n the order visited at step k = 1, 

For each step A; = 1,2,... do the following: 

if currently in state % = qt, then 

1. pick any state q^(C randomly with probability 1/(X — 1), 
this is the proposal; 
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2. let a = min{l, p^/pf}, this is the acceptance probability; 



3. accept with probability a, 

i.e. pick a sample of a uniformly distributed RV U € [0, 1], and 
if U < a then set = 
else set ft+i = 

As the acceptance probability in step 2 only involves ratios of the the — hard to 
compute — normalising constant w in Eq. (5) is indeed not needed. 

The simplicity of the formulation and possibility to correctly estimate the posterior 
without any approximation are certainly the main advantages of the method. We should 
warn that although the formulation is so simple, to use the method efficiently and 
correctly may not be so simple. 

First, MCMC is a Monte Carlo method, which means that estimates converge only 
slowly with increasing number of samples. Second, the samples — coming from a Markov 
chain — are obviously not independent. This makes estimating any statistic other than 
the mean (e.g. the variance) difficult. Also usual statistical formulas for the spread or 
accuracy of the estimates assume independent samples and are not applicable. Third, 
the sequence of visited states is not even stationary as the equilibrium distribution is 
only the asymptotic distribution. This means that one has to wait until k is large 
enough before actually starting to sample, this is called the burn-in period. Usually 
estimating that the burn-in period is over has to be tested by tests on the stationarity 
of the sequence. And finally, if the likelihood function and prior pdf differ very much, 
the acceptance probabilities a in step 2 may be very low — meaning that practically the 
chain has got stuck in some state and does not move on. 

Here we have used one of the simplest variants, where the underlying transition 
probabilities of the Markov chain are all equal. For extensions which partly try to 
alleviate the problems alluded to above and thorough explanations the reader should 
refer to the literature cited and the references therein. 

Now, the two methods to be considered in the following are based on the update 
via conditional expectation as outlined in Subsection 2.2, and they both employ the 
linear approximation from Subsection 2.3. This means that in contrast to MCMC they 
make an additional approximation error, or stated differently, they do not use all the 
information availabe for the update. After the discretisation of the space Q this boils 
down to using Eq. (12) in some way on Qm- How this is done differs in the two methods. 

3.2 Ensemble Kalman Filter 

The ensemble Kalman filter (EnKF) is a Monte Carlo method, a sampling interpretation 
of Eq. (12), for details see e.g. [8, 9]. The basic form is actually simple to describe and 
is a good preparation for the PCE-based form in the next Subsection 3.3. 

Pick Z independent samples {q/(u; 2 )}f =1 according to the probability measure P. 
Now Eq. (12) holds for the the RV q : fl —> Qm, i«e. for all a; € 17 (a.s. — almost 
P-surely). One takes this to mean that it holds for each cv z G J?. Arranging the 
the samples in a matrix Qf := [qt(LJi), . . . , qf(uz)], and similarly for the forecasts 
Yf := [yf(oJi), . . . ,yf(u>z)] and measurements Z, Eq. (12) is now formulated in matrix 
notation as 



where K is the same as in Eq. (12). 

But the variances needed to compute K have to be estimated from the sample. This 
simply takes the form 



Q a = Q / + K(Z-Y / ) 



(13) 



C, 



1 




(14) 



C 



Z - 1 

1 




(15) 



Z - 1 
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The normalisation terms [Z — 1) 1 do not really have to be used as they cancel in the 
computation of K. With the estimated means 

1 Z 1 Z 

<?/ = z^qfi^z) and V/= «2^V/(w*), 

Z=l 2=1 

the terms in Eq. (14) and Eq. (15) are 

Q f = Q f -q f l% (16) 
Y7 = Y f -y f l T z , (17) 

where lz is a vector of ones of size 

This method is a Monte Carlo method, hence it also suffers from the slow convergence 
with increasing Z. On the other hand it is fairly simple to implement, all it needs are 
samples. In practice the number of samples is often low, and then special care is needed 
when computing the covariances — whose rank can not be higher than the number of 
samples — and the Kalman gain K, see [8, 9]. Another closely connected problem is 
the possible underestimation of the posterior sample variance with low sample numbers. 
Still, this method is currently a favourite for problems where the computation of the 
predicted measurement yf(co z ) is difficult or expensive. It needs far fewer samples for 
meaningful results than MCMC, but on the other hand it uses the linear approximation 
inherent in Eq. (12). 

3.3 Polynomial chaos projection 

In Subsection 3.2 the Eq. (12) was discretised in the variables u G I? through sampling. 
Here another track is taken, relying on the tensor product representation of the RVs 
=S = Q ® S from Subsection 2.2. The first factor already was discretised to Qjy C Q, 
and here an explicit discretisation of the second factor is used. In a numerical sense the 
sampling in Subsection 3.2 is of course also a discretisation. As for Qm we pick a finite 
set of independent vectors in S. As S = L/2(f2), these abstract vectors are in fact RVs 
with finite variance. Here we will use Wiener's polynomial chaos expansion (PCE) as 
basis [13, 24], this allows to use Eq. (12) without sampling, see [31, 35, 27], and also 
[36, 3]. 

The PCE is an expansion in multivariate Hermite polynomials [13, 24]; denote by 
H a (0) = n?ceN ^afc(^fc) G <S the multivariate polynomial in standard independent Gaus- 
sian RVs 6(uj) = (9i(oj), . . . , 9f.(u}), . . . )keN, where hj is the usual univariate Hermite 
polynomial, and a = (a\, . . . , a*,, . . . )kem £ JV := Nq i s a multi-index of generally 
infinite lenght but with only finitely many entries non-zero. As ho = 1, the infinite 
product is effectively finite and always well-defined. 

The Cameron- Martin theorem assures us [13] that the set of these polynomials is 
dense in S = L2(f2), and in fact {H a / \J (a!)} a6 _/v is a complete orthonormal system 
(CONS), where a\ := IlfceN^fcO ^ s ^ ne P r °duct of the individual factorials, also well- 
defined as except for finitely many k one has a^! = 0! = 1. So we may assume that 
q(uj) = J2aeAf q a H a {0{ijj)), and similarly for z and y. In this way the RVs are expressed 
as functions of other, known RVs, and not through samples. 

The space S may now be discretised by taking a finite subset J C M of size J = |J7~|, 
and setting Sj = span {H a : a £ S} C S. The orthogonal projection Pj onto Sj is 
then simply 

Pj :Qm®S3 ^ Yl Q aH <* G Qm Sj - ( 18 ) 

ct<=Af aej 

We then take Eq. (12) and rewrite it as 

q a = q f + K(z-y f )= (19) 



E 



t&A/" aGAf 
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Projecting both sides of Eq. (20) is then very simple and results in 

£ qZH a =^(l? + K (z a - yj)) H a . (21) 

Obviously the projection Pj commutes with the Kalman operator K and hence with its 
finite dimensional analogue K. One may actually concisely write Eq. (21) as 

PjQa = PjQf + PjK{z -y f ) = Pjq f + K(Pjz - Pjy f ). (22) 

Elements of the discretised space J2m,j = Qm <8> $J C thus may be written as 
Em=i Eaej9 a ' m Pmffc<- The tensor representation q := (q a ^ m ) = Y^aejl™ ® e "' 
where the e a are the unit vectors in M. J , may be used to express Eq. (21) or Eq. (22) 
succinctly as 

q a = q/ + K(z- y/ ), (23) 

where K = K I with K from Eq. (12). Hence the update equation is naturally 
in a tensorised form. This is how the update can finally be computed in the PCE 
representation without any sampling [31, 35, 27]. 

It remains to say how to compute the Kalman gain — or rather the covariance 
matrices — in this approach. Given the PCEs of the RVs, this is actually quite simple 
as any moment can be computed directly from the PCE [24, 31, 35]; remembering that 
q° = IE (g), due to the orthogonality of the H a one has for the variance 

C y « C Pjy = E((Pjy)®(Pjy))= £ ( a \)y a ®y a , (24) 

c q , y = («0<z a ®v a « E («0 9 a ®v a . (25) 

As was already remarked Subsection 3.2, the covariance matrix can not have higher 
rank than the number of tensor products in the sum. And certainly any such truncation 
as is implicit in the projection operator Pj Eq. (18) or explicit as in Eq. (24) and Eq. (25) 
will reduce the variance. The difference to the procedure in Subsection 3.2 is that here 
the q a and y a are components of the orthogonal basis H a . The main 'difficulty' to 
apply the the update as described in this section is the need to have a PCE of qf{co) 
and especially of the forecast measurement yf{oo) = Y{qf(uj)). How this might be done 
will be sketched in the next Section 4. 

One should not forget to draw attention to the fact that once such an expansion is 
available, evaluating an expression such as ^2 a& j y a Ha(u) for some specific to £ Q is 
computationally relatively cheap, and this is an approximation to a sample of yf{uj)— 
it is precisely Pjyf(oS). Hence this can be used wherever samples are needed, and this 
device has actually been employed, i.e. in the MCMC method [22, 16, 32, 17] as described 
in Subsection 3.1, or in the EnKF method [18] described in Subsection 3.2. 

4 The forward problem 

As was noted already in Section 3, the spaces IA and Q are usually infinite dimensional, 
as is the space S = L2 (.!?). Similarly to the discretisation described there, the space 
IA has to be discretised as well. In an analogous fashion, choosing an iV-dimensional 
subspace Un = span {v n : n = 1,...,N} C U with basis {v n }^ =1 . An element 
of Un will similarly be represented by the vector u = [u 1 , . . . , u ] T € 1^ such that 
E% =1 u n v n eU N . 

The state equation may then be discretised by inserting the 'ansatz' that the solution 
is from Un by assuming in Eq. (1) that u = ^2 n u n v n . We obtain equations for the 
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coefficients u n for example by projecting in general Galerkin manner Eq. (1) onto IAn\ 
most straightforward is to use the basis {v n }n=i a ^ so f° r the projection: 

Vfc: (v k \^ t ^u n (t)v n ) u + {v k \A(q-Y,u n (t)vn))u = (vk\f(q-,t))u, (26) 

n n 

where we have now only included the dependence on the parameter q £ Q which has to 
be identified. Eq. (26) may in standard manner succinctly be written as an equation on 
R N : 

~u(t) + A{q-u) = f{q-,t). (27) 

The measurement operator will, in combination with the developments in Section 3, be 
denoted as (see Eq. (3)) 

y f = Y(u(t k ),q k _ 1 ) = Y(q k _ 1 ). (28) 

The unknown parameter is modelled as a RV — see Section 2, which in the discrete 
setting of Section 3 means that Eq. (27) reads 

d 

— u(oj) + A(q(u); u(oj)) = f(q(u)) F- a.s. in u € f2. (29) 
ot 

For MCMC (Subsection 3.1) and EnKF (Subsection 3.2) or any other Monte Carlo 
or collocation method, this equation has to be solved for each sample or collocation 
point oj z and q{ui z ), to obtain u(co z ), and with that predict the measurement yf{oj z ) = 
Y(u(oj z ), q(oj z )). Obviously, this may be computationally quite costly. As remarked at 
the end of Subsection 3.3, if one can compute a PCE of yf(uj) = ^2 a <=j yJH a {B(oj)), 
this may be used instead to great advantage, as it will be usually much less work to 
evaluate instead of going via Eq. (29). 

For the linear Bayesian update in Subsection 3.3 via PCE this comes natural, but 
of course this means that Eq. (29) has to be solved so that this is possible. Just as the 
parameter was represented in Subsection 3.3 via its truncated PCE Eq. (18) q(uj) = 
Ylaej Q a Ha{6(,w)), or through the tensor of coefficients q = J2 a eJ q a ® e<x ■, the same 
is assumed by an 'ansatz' for the solution to Eq. (29) u(uj) = Ylaej ua H a {6{oSj), 
represented through u = ^2 ae j u a ® e a . 

Inserting this into Eq. (29) and projecting with Pj Eq. (18) onto Sj, one obtains 
the equations for u(t) G M. N <S>1R J through Galerkin conditions with the basis {H a } a& j 
analogous to Eq. (26): 

^u + A(q;u) = f(q), (30) 

with the obvious interpretation of the terms. The same procedure applied to Eq. (28) 
results in yy = Y(u(ifc); qfc_i). Identifying qj with q^-i at step k of the updating, all 
the terms needed to use the update Eq. (23) are present. 

Let us remark that the tensor product representation like the one employed for the 
state u in Eq. (30) may be extended [27] to all entities in Eq. (30). Low-rank approxim- 
ations to those entities in tensor representation then become possible in different guises, 
as model reduction [5], as so called 'generalised spectral decomposition' [30], during the 
solution process progressively as 'proper generalised decomposition' [29], or in the iter- 
ation as approximate, perturbed, or compressed iteration [12, 26], and this may lead to 
considerable numerical savings. As Eq. (23) is in such a format, these savings carry over 
to the Bayesian update. 



5 Numerical examples 

A comparison of the different updating methods is outside the scope of this paper and will 
be presented elsewhere. Here we rather want to present some examples of identification 
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procedures which illustrate how the methods work. Note that these examples are not 
intended to describe real world applications but are rather set up to illustrate the class of 
full and linear Bayesian identification methods. Specifically, the measurement operation 
will not be preformed on a real system, but will also be simulated through computation. 
This has the advantage that we know what the 'truth' is, i.e. the value of the parameter 
field with which we simulate the measurement, so that one can compare this with the 
estimate from the identification. 

These examples will be both linear and nonlinear problems described through partial 
differential equations, namely linear and nonlinear diffusion and elasto-plastic behaviour. 
Hence the Eq. (1) is here a partial differential equation, and the 'parameter' to be 
identified will be some coefficient fields in the equation, either the diffusion coefficient, 
or the shear modulus. Let us denote such a generic random field as q(x,u), where now 
x 6 <S is a point in the spatial domain C 1^. More precisely the spatial function 
q(-,uj) is the parameter to be identified. We assume that this field is transformed in 
such a way that it is without constraints in some vector space Q. 

To start the numerical computations, one more step is needed to be able to effect- 
ively describe the random fields. What one needs for the developments described in 
Subsection 3.3 is the PCE of the random field q(x,u) = YlaeN Q a ( x )Hct(6(^))- The 
spatial coefficient functions are given by simple projection q a (x) = E (q(x, -)H a (6(-))). 

The computational problem with this approach is that the PCE is completely general 
and is defined without any reference to the random field q(x,u;). This means that an 
excessive number of RVs 0i(uj), . . . , 9k(uj), . . . may be needed to give an accurate enough 
approximation when the above PCE is truncated to some a € J C M . 

One way to have an accurate description with a fairly small number of RVs is to 
use the Karhunen-Loeve expansion (KLE), e.g. see [24]. With the covariance function 
of the random field c q (x\,X2) ■= IE (q(xi, -)q(x2, •)) — f° r the sake of simplicity we will 
only consider scalar random fields — one computes the eigenvalues and -functions of the 
Fredholm eigenvalue problem with the covariance function as kernel: 

c q (xi,X2)qj(x2) dx2 = Xjqj(xi), (31) 

which results in the spectral decomposition c q (xi,X2) = J2j ^jQj( x i)lj( x 2)- 
The Karhunen-Loeve expansion then is 

q(x,u) = q{x) + ^2y/\j£j(u)qj(x), (32) 
3 

where the non-negative eigenvalues Xj and orthonormal eigenfunctions qj (x) are usually 
ordered in a decreasing sequence, and the zero-mean unit-variance uncorrelated (i.e. 
orthonormal) RVs are given through a simple projection of the random field 



= ~ 7v= / w ) - Q( x ))Qj( x ) dx. (33) 

As the variance of each term in the KLE Eq. (32) is Xj — the total variance is of := 
J2j -\? = Jtf Cq( x , x) dx — it is usually truncated after say M terms so that the residual 
variance of := J2j>M 1S small. 

The field q(x, u) is then approximated or represented by 

q{x,uj) = q(x) + a c ^ €j(w)qj(x), (34) 

j<M 

which has the same mean q(x) as the original field Eq. (32) and covariance c q (xi, X2) = 
°"c Sj<j\/ ^jQj( x i)Qj( x 2)- If the correction factor a c is chosen as a c = \J of] (of — cr%), 
the approximate field Eq. (34) will also have the correct total variance — only distributed 
on the modes qj(x) with j < M. 
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Through the KLE the random field is best approximated in variance with the least 
number of RVs. For further computation the RVs are sometimes unwieldy, so they 
in turn may now be expanded in a PCE 

= E ^H a (0 M (uj)), (35) 

where we only need M isonormal RVs Om(w) = (Oi(u), . . . ,0m(w))- To become com- 
putationally viable, the series Eq. (35) will have to be reduced to a finite sum, usually 
by choosing a finite subset Jj C NQ f . One criterion could of course be again to choose 
those a which give the highest contribution. By doing this one will obviously decrease 
the variance, and this may be compensated by using an additional scaling factor u,- 
computed similarly to a c above. For the sake of simplicity we shall avoid doing this 
here. Inserting the thus truncated series into Eq. (34) gives 

q(x,u) = q(x) + o c Y, Z,fH a (0 M {u))qj(x). (36) 

j<M aejj 

Let J := IJj Jj, set = for a € J\\J k ^j Jk, and let f*(x) := a c Zj< M \ ^'^(.r) 
and £°(x) = q(x); then the truncated PCE of q(x,uS) in M Gaussian RVs is 

q(x,u;)=^2 C(x)H a (e M (u;)). (37) 

Depending on the specific situation, either the expression Eq. (34) or Eq. (37) can be 
used in an actual computation. 

5.1 Examples with MCMC updating 

Our study will start with a relatively simple system and then progress to more difficult 
cases, illustrating the Markov chain Monte Carlo (MCMC) methods of Subsection 3.1. 
Let us recall that in that case the posterior is described by changing the distribution of 
the underlying RVs. 

We start with a simple model of steady state heat transfer [16], where the energy 
balance equation leads to 

— div(K(x,uj)Vu(x, u))) = f(x,oj), a.e. x E Q C M 2 (38) 

where both the heat conductivity k and the right hand side / may be considered as 
random fields over a probability space Q. Thus Eq. (38) is required to hold almost surely 
in Lj, i.e. P-almost everywhere. For the sake of simplicity the conductivity k(x, oj) is taken 
to be a scalar valued random field, although a conductivity tensor would be a more 
appropriate choice. First we will not let k depend on u and Eq. (38) represents linear 
model. To the balance equation one has to add appropriate Dirichlet and Neumann 
boundary conditions, given as a a prescribed temperature no = 20° C and a heat flux 
v x = 100 Win" 2 on the left side and zero flux on top and bottom respectively, as shown 
in Fig. 1. This completes the concrete description of the abstract Eq. (1), where here 
there is no time dependence and thus the time derivative terms vanish. 

After spatial discretisation by a standard finite element method into 120 elements 
(see Fig. 1), the problem is further discretised with the help of random variables as 
described in Section 4. Following this the problem reduces to the form given by Eq. (30) 
without the first term as mentioned already. The system in Eq. (30) is then solved via 
the methods given in [5, 29, 26]. 

The conductivity parameter k(x, lj) is first considered in two different scenarios: first 
as a simple random variable — i.e. no dependence on x G Q — and then as a random field. 
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v x = 100 Wm" 2 w(x°)=20°C 




Figure 1: Experimental setup. 



5.1.1 Conductivity as a simple random variable 

We take the thermal conductivity n as independent of the coordinates x 6 Q, and for 
simplicity it is we assume it as an a priori normally distributed variable Kf = q* with 
mean value E (qj) = 2 Wm _1 K _1 and standard deviation 0.3 Wm — 1 K~ 

The measurements will be simulated with a 'true' value of Kt = 2.5 Wm K~ . They 
are performed in each node of the finite element mesh; and further polluted at each node 
by independent centred Gaussian noise with a standard deviation of a e = 10°C. 




Figure 2: Comparison of prior and posterior pdf and the likelihood function. 

In this particular example the identification of k is done in a fully Bayesian manner 
(see Eq. (4)) with the help of a MCMC procedure with 100000 samples as described 
in Subsection 3.1. In Fig. 2 we display the shape of the prior, the likelihood function, 
and the posterior probability density function (pdf), and compare it to the truth. We 
see that the mean and mode of the posterior — both are often taken as single point 
estimates — when compared to the prior have moved in the direction of the truth; and 
the variance of the posterior (signifying the uncertainty about the true value) is less 
than that of the prior. Another frequent single point estimate is the maximum of the 
likelihood function — the well known maximum likelihood estimate -Lmax(ft) — which may 
be seen as also being close to the truth. One may also observe that these (and other) 
point estimates give only an incomplete picture, as they will not contain information on 
the residual uncertainty. The Bayesian identification procedure on the other hand yields 
a complete probability distribution which informs also about the residual uncertainty. 

The MCMC method performs very well as shown, it is computationally very expens- 
ive. It requires the calculation of the model response for a large number of samples, 
each time solving a FE-system with a different material parameter. In order to improve 
the performance [23, 16] we compute a polynomial chaos approximation of the model 
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Figure 3: Comparison of the computational times of pure MCMC and PCE based 
MCMC methods versus the estimate c a . 

response by a stochastic Galerkin procedure. Then the integration of the posterior dis- 
tribution may be realised by sampling the PCE as alluded to at the end of Section 4. 
This significantly accelerates the update procedure as shown in Fig. 3, where the estim- 
ate Cfj := <r(L max («;))/E (L max (n))) is plotted versus computational time for both the 
pure and the PCE based MCMC approach. Clearly the PCE approximation for the 
same accuracy of estimate runs much quicker than the pure MCMC procedure. 




Figure 4: Polynomial chaos approximation of the model response in one node of com- 
putational domain 

The dependence of the PCE on the polynomial order is investigated in Fig. 4 where 
the accuracy of the temperature approximation in one node of the domain is plotted for 
different polynomial orders. Comparing the approximated solution with the reference 
one obtained by MCMC with 100000 samples one concludes that only the fifth order 
approximation can be safely used. In this case both responses match. Otherwise, for 
lower values of the polynomial order the mean value is estimated correctly (the crossing 
point) but not the higher order moments (the lines do not match). We try to quantify 
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Figure 5: Kullback-Leibler distance D(tt \\tt) from the posterior obtained by pure MCMC 
to the PCE reformulated posterior, versus polynomial order 

this discrepancy in Fig. 5 in a quantitative assessment of the error in the posterior 
density of the PCE-based MCMC by computing the Kullback-Leibler divergence (KLD) 
of 7r K (posterior obtained by PCE/MCMC) from ir K (posterior obtained by pure MCMC), 



This is shown in Fig. 5, where one may observe a rapid rate of the convergence of the 
surrogate posterior tt k to the true posterior. 

5.1.2 Conductivity as random field 

The next example is a bit more realistic, by not assuming a priori that the truth is 
spatially constant. We repeat the previous analysis for the prior Kf being a random 
field — again for the sake of simplicity — assumed as Gaussian. The field Kf = qj is 
described by a covariance kernel exp(— r/l c ) with r being the distance and l c a correlation 
length. Furthermore, the mean and standard deviation are chosen in the same manner 
as before (see Section 5.1.1), while the truth is modelled in a more realistic way as one 
realisation of the prior field, as shown in Fig. 6. All other quantities such as measurement 
and measurement error stay the same as in the previous example. 

The numerical simulation of the conductivity field is performed by truncating the 
prior KLE (see beginning of this Section 5) to six modes, i.e. the random variables 
£i . . . & i* 1 Eq- (32) have to be updated. Similarly as before we use the pure MCMC 
update procedure as well as the PCE based MCMC method. The pure MCMC Bayesian 
update uses 100000 samples in order to assimilate the posterior variables, while the 
PCE approximation is taken to be of second order. In both cases the updated variables 
obtain the similar non-Gaussian form as shown in Fig. 7 b) and d). In addition, by the 
comparison of the bar-graphs of prior and posterior distributions in Fig. 7 b) and d) one 
may see the significant reduction of the prior variance. This is clearly visible in Fig. 7 c) 
where the probability density function of £5 is plotted. Through the identificaton or 
assimilation process one obtains from a very wide prior distribution a much narrower 
posterior as expected. Also, both methods give very similar results leading to the 
conclusion that the error introduced by PCE truncation to second order seems to be 
negligible in this example. 
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Figure 7: Distributions of particular random variables: (a) prior, (b) posterior obtained 
by pure MCMC, (c) pdfs of £5, (d) posterior obtained using PCE approximation of 
model response. In a), b), and c), the box graphs show for each RV (£1, ■■■■,£,&) on the 
ordinate the median, the central 25th percentile as a box, the central 75th percentile, 
and outliers. 
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5.1.3 Non-linear diffusion 



In the isotropic quasi-static diffusion equation given by Eq. (38), we now assume that 
the conductivity k depends on the temperature field u(x,u) through 



k(u, x, to) = kq(x, to) + kiu(x,lo) , 



(40) 



and the previously linear SPDE becomes nonlinear. Again, just like the prior assumption 
of a Gaussian conductivity before — chosen purely for reasons of simplicity and useful 
solely because the coefficient of variation was very small — is not consistent with the 
requirement by the second law of thermodynamics that the conductivity be positive, also 
this linear dependence of the conductivity can only be seen as a crude approximation, 
useful only for a small temperature range. Its only purpose is to show how the method 
works for a simple nonlinear state equation. 
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Figure 8: Distributions of particular random variables: (a) a priori, (b) a posteriori 
obtained by direct evaluation of the FE model, (c) probability density functions of £5, 
(d) a posteriori obtained using PC approximation of model response. Box-graphs as in 
Fig. 7. 



This very simple non-linear constitutive law is described by two artificial parameters 
hq and k±, where ki = 0.05 WuMK -2 is assumed as known and hence is certain, and 
kq is a priori taken as a Gaussian random field as in the previous Section 5.1.2 with a 
six term KLE with mean E (kq) = 2 WnMK^ 1 and standard deviation 0.3 Wm - K -1 . 
The truth is taken as one sample of this random field. The rest of the experimental set 
up is taken from the previous tests (see the previous Section 5.1.1 and 5.1.2). 

Similarly as before, the MCMC method in its pure sampling and PCE based form 
is used to calculate the pdfs of the conductivity parameter n\. The update procedure 
consists of assimilation of the six random variables £1, ... ,^6 °f the truncated KLE of 
the prior field qf = Kf with the help of the temperature measurements in all points of 
the finite element mesh. Similarly as in the linear case, the box-graphs of the prior and 
posterior random variables in Fig. 8 a) and b) show significant reduction of the prior 
variance and a change of the mean, which causes the non-Gaussian form of the posterior 
random variables. Furthermore, the comparison of the pure and PCE based MCMC 
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procedure in Fig. 8 c) and d) shows that both methods give approximately the same 
results, even though the PCE approximation is only of second order. 

We see that the MCMC based Bayesian update works also for random fields, even 
though the measurement (here the temperature) is not linear in the quantity to be 
identified (the conductivity), and also works in case the state equation is nonlinear. 
The inconsistent shape of the prior distribution (allowing negative conductivities with 
non-zero probability) is corrected in all the updates to a form where practically all the 
probability weight is in the positive range. The low order truncated PCE based MCMC 
performs almost just as well and is computationally much cheaper — this is practically a 
surrogate model. 

5.2 Examples with linear Bayesian updating 

The examples shown up to now have used the so called 'full' Bayesian update based on 
Eq. (4) or Eq. (5), i.e. by updating the measure. We now turn our attention to updates 
which change the random variable, based on Eq. (23). The first example will also be a 
diffusion equation, although on a different domain, and the second example will be an 
elastic-plastic system, hence non-smooth and strongly nonlinear. 

In previous three examples we have considered the conductivity a priori as a normally 
distributed random field, which ignores the positivity requirement as already noted. In 
order to have an a priori model which takes care of this requirement in a way that 
our variables are free of constraints — the importance of this was already stressed in 
Subsection 2.2 — we model the logarithm of the conductivity field; the a priori field is 
taken as a lognormal random field, according to maximum entropy considerations this 
is the right choice if one only presumes that the field has a finite variance. This makes 
the identification problem harder, as the logarithm resp. exponential is additionally 
involved. 

As already pointed out in Subsection 2.2, this corresponds with [1] considering the 
positive cone in the space of all fields as a differentiable Riemann manifold, and more 
specifically also as a Lie group. The logarithm and exponential are then the equally 
named maps from Lie theory, carrying the tangent space at the identity into the corres- 
ponding Lie algebra and vice versa. As the manifold can be equipped with a Riemann 
metric which is carried from the Lie algebra via the exponential map to the tangent space 
of the identity, and from there via the group operations to any other place, distances on 
the manifold can be measured as path lengths along geodesies. This turns the manifold 
into a metric space and would allow to use the notions of Frechet-type conditional ex- 
pectations as alluded to in Subsection 2.2. But geodesies correspond uniquely to straight 
lines in the Lie algebra, hence to their direction vectors without any constraints [1]. It 
is in this space that we propose to do all the operations. 

5.2.1 Diffusion with linear Bayes updates 

We consider the same Eq. (38) as before, but now on an L-shaped domain Q with 
specified homogenous Dirichlet and Neumann boundary conditions. This makes the 
heat flow a bit more complicated than in the previous example, where inhomogeneities 
in the heat flow arise solely due conductivity variations, whereas here they are also 
caused through the geometry. The external loading is defined as a sinusoidal function 
/ = /o sm(j^x T v + ip), with /o being the amplitude, A the wave-length, ip £ [0, 2ir] the 
phase, and v = (cos a, sin a) the direction of the sinusoidal wave specified by an angle 
a G [— 7r/2, 7r/2]. For a more detailed description, please see [35]. 

The identification of the true conductivity field Kt is done with the help of the linear 
Bayesian method in its direct PCE and MC sampling (EnKF) form. Additionally, the 
update process is realised in a sequential way as shown in Fig. 9 by repeating the 
measurements in the same experimental conditions for different values of the right hand 
side. In other words one takes for the prior in the next update the posterior from the 
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previous update and does the new measurement according to a new value of /. The 
right hand side values are altered by variation of the appropriate parameter values, such 
as the wave length A, the phase ip, etc. In this way different regions of the domain are 
stimulated, aiding in the identification process. 

The forward problem is solved within the finite element framework by discretising 
the spatial domain into 1032 triangular elements. The measurements y(u) of the state 
quantity u, 

y(u,uj) :=[... ,y(xj),...]<E R L , y( Xj ) = u(x,uj)dx, (41) 

Jg j 

are obtained on little patches Qj C Q centred at the finite element node Xj £ Q = 
{x%, ...,xb}- The measurements are performed in only 10% of the total number of 
mesh nodes, equally distributed over the computational area. The measured values are 
disturbed by a centred Gaussian noise with the diagonal covariance (T 2 I, where a £ is 
approximately equal to 1% of the measured value. 

For the 'truth' we take one realisation of a lognormal random field sampled from the 
modified lognormal distribution Kt ■= 2 + Kb(x,uj), where Kb represents a homogenous 
lognormal random field with E (Kb) = 1 and standard deviation a Kb = 0.2236 obtained as 
the exponential transformation of a Gaussian random field qb described by an exponential 
covariance function exp(— r/l c ), with r being the spatial distance and l c = 0.5 the 
correlation length. 

Similarly to this, the prior is chosen to be a homogeneous lognormal field Kf = 
ex.p(qj) with E(/tj) = 2.4 and standard deviation a Kf = 0.8944. Its underlying Gaus- 
sian random field is described by a Gaussian covariance function exp(— r 2 /l 2 c ) with a 
correlation length l c = 2, which significantly distinguishes Kf from Kt, see Fig. 10 a) and 
b). For the PCE computations we used polynomials with total order up to p = 3 and 
M = 10 KLE modes, while for the EnKF we chose 290 ensemble members, as this has 
approxmately the same workload as the PCE. 

In Fig. 10 c) and d) the true field is compared with the first and the fourth sequential 
posterior update obtained by the PCE based method. Since the prior is very different 
from the truth by all its properties the 'shapes' of the first update and the true field 
are still relatively distinct. However, the mean and the variance of the prior are clearly 
moved in the direction of the truth. Doing further updates with different loadings, the 
posterior field approaches the truth although some residual uncertainty remains. This 
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Update 


PCE 

e m e a var (««,) 


EnKF 

e m e a var (/c a ) 


A priori 
1st update 
2nd update 
3rd update 
4th update 


0.53 0.42 0.8 
0.18 0.14 0.13 
0.07 0.06 0.07 
0.06 0.04 0.07 
0.03 0.03 0.07 


0.53 0.42 0.8 
0.19 0.15 0.08 
0.02 0.02 0.001 
0.02 0.02 5.38e-04 
0.02 0.02 3.58e-04 



Table 1: Comparison of relative error of the mode (e m ), relative error of the mean (e a ), 
and the variance (var(K a )) of the pdf in one point of the domain obtained by the direct 
(PCE) and the ensemble Kalman filter (EnKF) method. 
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behaviour is summarised in Table 1, where the PCE and EnKF results are shown for 
one point in the domain. Here one may notice that the EnKF method results in an 
unrealistically small variance (smaller than 10 -2 already in the second update), such 
that the truth is in a very low probability region. This undesirable behaviour, which 
gives such overconfident estimates of the residual uncertainty is one of the problems of 
the EnKF, and one has to take care not to be misled. 



Q 
Q_ 



— prior 

— 1 st update 
^-2nd update 
_ _ 3rd update 
— 4th update 
X truth 




Figure 11: Probability density function (pdf) of the posterior for four updates by the 
direct (PCE) method. 



In a similar vein, comparing the posterior pdfs of the EnKF and the PCE update 
in Fig. 11 and Fig. 12, one may observe that after first update the posteriors are fairly 
similar. However, in further updates (from second to fourth), the EnKF appears to be 
very certain about the posterior value, although the truth stays in a very low probability 
region. In contrast to this, the PCE update is able to keep a credible residual variance 
such that the truth stays inside a high probability area. 

5.2.2 Elasto-plasticity with linear Bayesian update 

To describe quasi-static elasto-plasticity with hardening, we are lead a bit beyond the 
format in Eq. (1) as one has to consider the non-smooth evolution of the internal para- 
meters [25, 33, 34], which gives a variational inequality as a generalisation of the differ- 
ential equation. The state variable is u = (v,ep,f), where v denotes the displacement 
field, e p is the plastic deformation, and v the appropriate internal hardening variable. 
In a mixed formulation [33, 34] one has to consider at the same time a dual variable u* 
of stress-like quantities. These quantities have to stay inside a non-empty closed convex 
set J^, the so-called elastic domain. The abstract mixed formulation then is to find 
functions u(t) and u*(t) such that u*(t) S Jtf and 

A(u(t))+u*(t) = f(t) 
Vz* G JT : ((u(t),z* - u*(t)}) < 0. (42) 

In this description, the first equation represents the equilibrium condition, while the 
second is the so-called flow rule, stating that the rate of change u(t) = (d/dt)u(t) is in 
the normal cone of Jff at u*(t); for more details see [25, 33, 34]. 

Note that while u*(t) is not on the boundary of Jf, the stress-strain relationship 
stays linear and the model reduces to a linear equation mathematically similar to the 
diffusion equation, with n replaced by a constitutive tensor described by the bulk K 
and shear modulus G. Here we investigate the identification of the shear modulus G via 
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Figure 12: Probability density function (pdf) of the posterior for four updates by the 
EnKF method. 




Figure 13: Geometry and loading of Cook's membrane test 
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e) f) 



Figure 14: Shear modulus G via elastic response: a) truth, b) prior, c) first update, 
(linear elastic range), d) fourth update (linear elastic range), e) first update (nonlinear 
elasto-plastic range) f) fourth update (nonlinear elasto-plastic range). 
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Model 


Update 




A priori 1st 2nd 3rd 4th 5th 6th 


Elastic 


0.45 0.15 0.04 0.02 0.01 0.01 0.01 


Elasto-Plastic 


0.45 0.30 0.21 0.15 0.08 0.03 0.02 



Table 2: The comparison of the relative root mean square error e a in each update for 
purely elastic and elasto-plastic response. 



the linear sequential Bayesian procedure and shear stress measurements. The numerical 
tests are performed on a well-known and often used example. 

Cook's membrane is clamped on one end and loaded by a shear force in the ver- 
tical direction at the other end as shown in Fig. 13. The plate is discretised into 225 
quadrilateral quadratic eight-noded serendipity finite elements (see [40]), of which 30% 
are chosen to place the measurements. For the sake of simplicity we do not investigate 
which nodes are the most appropriate. Instead, the measurement points are equally 
distributed over the domain, and the measurements are polluted by a measurement er- 
ror modelled by independent Gaussian noise with zero mean with a diagonal covariance 
ex 2 /, and a £ approximately equal to 1% of the measured value. 

a a 




In this particular example the measurement represents the shear stress a xy numer- 
ically computed in a virtual experiment with the adopted 'true' value of Gt- The truth 
Gt is taken to be one realisation of the modified lognormal field Gt = Gq + G\kg, 
where k g := exp(q t ), G = 82760 MPa, G% = 0.1 MPa, E (kg) = G and standard 
deviation okq = O.lGo- The underlying Gaussian random field qt is described by a 
Gaussian covariance function exp(— r 2 // 2 ), with r being the spatial distance and l c = 30 
the correlation length. 

Similarly as for the previous diffusion example in Section 5.2.1, the shear modulus 
has to be positive and the prior distribution is hence assumed to be lognormal, though 
with different characteristics than for the truth. Namely, the prior is assumed to be the 
lognormal random field Gf = exp(qj) with E (Gt) = 107590 MPa, a standard deviation 
of 0.4Go, and an underlying covariance function of qf given by exp(— r/l c ) with l c = 20. 
In this way the realisation of the prior is very different from the realisation of the true 
field as shown in Fig. 14. 

We compute the direct linear Bayesian update via polynomial chaos expansion of 
order p = 3 and M = 10 random variables. The updated field, i.e. the posterior, is 
then adopted for the new prior in the next sequential update, see Fig. 9. Thereby one 
introduces new information into update process by changing the value of the right hand 
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Figure 16: Comparison of the posterior and prior pdfs over the updates: a) elastic 
(linear) response, b) elasto-plastic (nonlinear) response. 
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side. With each update the force decreases or increases in every second update for 20% 
starting from 1 MN/m. 

The sequential update is done in six stages using both purely elastic and elasto- 
plastic response. As expected, the nonlinearity is detrimental to the updating and the 
posterior obtained via elastic response is better than the one from an elasto-plastic model 
as can be seen in Table 2, where the relative root mean square error reduces faster with 
each update for purely elastic response. This is confirmed from the plots of the relative 
error over the domain. In Fig. 15 one may see that the elasto-plastic response has 10 
times bigger error than the elastic response. In addition, the error is similar in almost 
all parts of domain besides the region where the plastification starts to occur. 

Finally in Fig. 16 we compare the pdfs for different updates for both the elastic/linear 
and elasto-plastic (nonlinear) response. The overall picture is similar to what was ob- 
served before, namely that the severe nonlinearity in the elasto-plastic model makes the 
identification process much harder. 

6 Conclusion 

The problem of identifying parameters or quantities in a computational model by com- 
parison with either real world measurements or other computational models (e.g. more 
refined models) is an old and frequent one. Practically all approaches start from the idea 
that the choice of parameters should be such as to minimise a certain error functional. 
Classical methods to do this lead to regularisation. Another point of view — taken here — 
is to embed the unknown quantity in a probability distribution, where the spread of the 
probability distribution should reflect the uncertainty about that quantity. 

We have, starting from the elementary textbook formulation of Bayes's theorem, 
shown how it connects to conditional expectation. Conditional expectation has the 
minimisation of variance at its background. This in turn gives rise to a computational 
characterisation of updates. By approximating the space of all measurable functions 
through the subspace of linear functions, we disregard a certain amount of information, 
but we are rewarded with a simple computation. Our result contains the well-known 
Kalman filter as a special case. 

We then discuss how these theoretical constructs can be implemented in a real com- 
putation. The first group is based on the classical formula for measures and probability 
densities. We sketch the Markov chain Monte Carlo (MCMC) methods, which may 
be used in this case. We also discuss how the Monte Carlo sampling may be acceler- 
ated through the use of functional approximations for the stochastic part. We here use 
standard Hermite polynomials in Gaussian RVs, i.e. Wiener's polynomial chaos. 

The second group is based on the conditional expectation idea, leaving the under- 
lying measure unchanged and updating the relevant RV. We show how this idea may 
be implemented in a sampling Monte Carlo manner — i.e. the ensemble Kalman filter 
(EnKF) — or alternatively in the PCE setting. We then demonstrate the workings of 
all these methods on some simple examples of slowly increasing difficulty. We find that 
MCMC needs a huge number of samples, EnKF needs considerably less. On the other 
hand the variance estimate of EnKF seems to be often erring on the optimistic side, 
thus severely underestimating the residual uncertainty. The PCE based methods on the 
other hand do not seem to suffer from this illness and give much more reliable results 
at a comparable workload. Our final identification problem is a very difficult one, an 
elasto-plastic system, or mathematically speaking a variational inequality. The non- 
smoothness inherent in such problems slows also the PCE based methods down, but 
they still succeed. In our view these PCE based methods show great promise for these 
taxing parameter identification problems. 
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